This tutorial follows from Section 2, “Modeling considerations when estimating CDD,” of the main text:
Section 2. “Modeling considerations when estimating CDD”
For survival and growth models, defining the density metric and the size of local neighborhoods also require consideration. The density of conspecific and heterospecific neighbors surrounding a focal individual (or plot) can be measured within different distances (i.e., radii). Within a radius, neighbors should be weighted in the density summation with regard to their size and distance to the focal individuals, with the rationale that neighbors at a greater distance and of smaller size have smaller effects. Typically, weighting involves dividing the basal area or diameter by a linear function or an exponential function of distance to allow competitive effects of an individual of a given size to decline with distance (Uriarte et al. 2010; Canham et al. 2006). Biologically meaningful radii should be tested based on the size/life stage, vital rate of interest, and the likely agents of density dependence in the system. For a system of interest, we suggest comparing models with multiple distances (e.g., using maximum likelihood or information criterion) because forests may differ in how neighbors influence performance.
Here, we demonstrate how to calculate the density of conspecific, heterospecific, and all neighbors surrounding a focal individual or plot for a given distance when XY coordinates are known. As a result, this section requires that the data set contains the location of mapped stems. If that is not available (as is common in seedling studies or smaller plots), continue to part 2 of this appendix.
We then demonstrate how to weight the calculation of neighborhood density by individual size (i.e., basal area) and distance using an exponential decay function (one of many possible decay functions), allowing the competitive effect of a neighbor to decline with its distance from the focal individual (Uriarte et al. 2010; Canham et al. 2006).
To assess which shape parameter of the exponential decay function is most appropriate for the data set, we fit models with multiple combinations of decay parameter values and compare models using log likelihood.
From a computational perspective, this approach can be relatively resource intensive both in terms of time and object size. It’s possible to make this approach more efficient by subdividing the data (e.g., by plot) or using more efficient data structures such as data.table.
We also note that alternative approaches allow the estimation of the effective scale of neighborhood interactions directly from the data (Barber et al. 2022). An excellent case study using Stan is available at https://mc-stan.org/users/documentation/case-studies/plantInteractions.html.
# Load libraries
library(tidyr)
library(dplyr)
library(ggplot2)
library(parallel)
library(here)
library(spatstat.geom)
library(mgcv)       # For fitting GAMs
library(lubridate)  # For calculating census intervals
library(broom)      # For processing fit models
library(purrr)
library(kableExtra) # For table styling
2.3 Data format explanation
For this tutorial, we will be using an example data set from Barro Colorado Island (BCI) (available at https://github.com/LisaHuelsmann/latitudinalCNDD/tree/main/data_prep/input/data_tree) that includes 7,028 observations of 3,771 individuals of 16 species across two census intervals within the 50 ha BCI forest dynamics plot (more information at http://ctfs.si.edu/webatlas/datasets/bci/). Each stem is individually mapped, which allows us to calculate neighborhood density across different distance thresholds.
The code below assumes the data is in a format where each row is an observation for an individual from a census. For this data set, the column descriptions are as follows:
treeID: unique identifier for each tree
sp: species code
gx: spatial coordinate on x axis
gy: spatial coordinate on y axis
dbh: diameter at breast height (mm)
ba: basal area (m^2)
status: status at census, A = alive, D = dead
date: date of observation
census: census number
surv: survival status at census, 1 = alive, 0 = dead
surv_next: survival status at next census, 1 = alive, 0 = dead
mort: mortality status at census, 1 = dead, 0 = alive
mort_next: mortality status at next census, 1 = dead, 0 = alive
interval: time interval between censuses in years
Let’s take a quick look at the data set we’ll be working with:
Code
head(dat, n = 5)
# A tibble: 5 × 14
treeID sp gx gy dbh ba status date surv surv_next mort
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <date> <dbl> <dbl> <dbl>
1 3884 ceibpe 296. 24.8 1500 1.77 A 1981-11-10 1 1 0
2 3998 cordbi 280. 45.4 281 0.0620 A 1981-12-11 1 1 0
3 4065 ceibpe 289. 266. 2000 3.14 A 1982-01-08 1 1 0
4 4070 cordbi 283. 287. 356 0.0995 A 1982-01-08 1 1 0
5 4119 ceibpe 258. 224. 1600 2.01 A 1981-12-13 1 1 0
# ℹ 3 more variables: mort_next <dbl>, interval <dbl>, census <dbl>
We can produce a map (Figure 2.1) of tree locations, where point size is scaled to basal area and color indicates species, with each census displayed as a panel:
Code
ggplot(dat, aes(x = gx, y = gy, size = ba, col = sp)) +
  geom_point() +
  facet_wrap(~census) +
  theme_bw(10) +
  theme(legend.position = "right") +
  labs(size = "Basal area", col = "Species")
Figure 2.1: Map of tree locations by census
2.4 Define exponential decay function
We will demonstrate how to calculate neighborhood densities using an exponential decay function. In principle, it’s possible to use any number of different decay functions and to explore various ranges of values that determine the rate of decay. Here, we present the general framework for how one might explore different scenarios, though the ultimate choice of decay function and parameter values will depend on the study. Note that this method of calculating neighborhood density is only possible when neighbors’ x-y coordinates are known.
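The tutorial calls an `exponential_decay()` function below without showing its definition in this extract. As a minimal sketch, we assume the common form exp(-distance / mu), where larger values of the decay constant mu produce a slower decay and hence a wider effective neighborhood; the exact functional form used in the original analysis may differ.

```r
# Assumed form of the decay function (a sketch, not necessarily the
# exact form used in the original analysis): the multiplier equals 1
# at distance 0 and declines toward 0 with distance; larger mu values
# mean slower decay
exponential_decay <- function(mu, distance) {
  exp(-distance / mu)
}

# Quick check of behavior at a few distances
exponential_decay(mu = 5, distance = c(0, 5, 25))
```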
Let’s see what the exponential decay function looks like across a range of mu values (Figure 2.2):
Code
# Set range of mu values and distances
decay_values <- seq(from = 1, to = 25, by = 2)

# Use sprintf to add leading 0s, will help with sorting later on
decay_names = paste("exp", sprintf("%02s", decay_values), sep = "")

distances <- seq(1, 100, 1)

# Generate a dataframe with each combination of mu and distance
example_decay_values <- expand_grid(decay_values, distances) %>%
  # Rename columns
  rename(decay_value = decay_values, distance = distances)

# Evaluate distance decay function for each combination of
# mu and distance
example_decay_values$decay <- exponential_decay(
  mu = example_decay_values$decay_value,
  distance = example_decay_values$distance)

# Plot results
ggplot(example_decay_values, aes(x = distance, y = decay,
                                 color = decay_value, group = decay_value)) +
  ylab("Density") +
  xlab("Distance (m)") +
  scale_color_continuous(name = "Value of \ndecay constant") +
  geom_line() +
  theme_bw(12)
Figure 2.2: Plot of decay function
2.4.1 Determine which trees are at edge of plot
Trees near the edge of the plot boundary have incomplete information about their neighborhoods, because trees outside the plot boundary are not mapped. Typically, trees within a certain distance threshold of the plot edge are excluded from analysis as focal individuals, but are still included as neighbors when calculating neighborhood densities.
We will add a column to our data set called ‘edge’ that is TRUE if the tree is within a set distance of the plot edge.
For this example data set, the plot dimensions are 300 x 300 m, ranging from 0-300 on both the x and y axes, representing a subset of the overall 50 ha forest dynamics plot at BCI.
Code
# Set threshold for distance to edge
distance_threshold_edge = 30

# Add in min and max values for corners of plot
min_x <- 0
max_x <- 300
min_y <- 0
max_y <- 300

dat$edge = dat$gx < min_x + distance_threshold_edge |
  dat$gx > max_x - distance_threshold_edge |
  dat$gy < min_y + distance_threshold_edge |
  dat$gy > max_y - distance_threshold_edge

# How many trees fall within the edge threshold?
table(dat$edge)
FALSE TRUE
4526 2502
Below is a plot (Figure 2.3) of tree locations colored by whether they fall within the edge threshold, with each census in a separate panel.
Code
ggplot(dat, aes(x = gx, y = gy, col = edge)) +
  geom_point() +
  facet_wrap(~census) +
  coord_fixed(ratio = 1) +
  theme_bw(12)
Figure 2.3: Map of tree locations colored by whether they fall within the edge threshold
2.5 Calculate distances among focal individual and neighbors
Next, we will calculate distances among individuals in our plot, using an upper distance threshold to define which individuals are considered neighbors.
Because this example only includes one census interval (two censuses total), we will subset down to just the first census to calculate neighborhood density.
If the data set were to contain multiple census intervals, it would be necessary to calculate neighborhood density separately for each census interval, using only the individuals that were alive at the beginning of that census interval.
Code
# Set distance threshold for considering neighbors
distance_threshold_neighborhood <- 30

# Subset to first census - this will be different for different
# datasets
dat_first_census <- dat %>%
  filter(census == 1)

# Format into 'ppp' object
dat_ppp = spatstat.geom::ppp(dat_first_census$gx,
                             dat_first_census$gy,
                             window = owin(range(dat$gx), range(dat$gy)),
                             checkdup = F)

# Determine close pairs based on distance threshold
# Returns a list that we convert to tibble later
neighbors = spatstat.geom::closepairs(
  dat_ppp,
  rmax = distance_threshold_neighborhood, # Max radius
  what = "ijd", # return indices i, j, and distance
  twice = TRUE)

# Convert to dataframe
neighbors <- as_tibble(neighbors)

# Take a peek at the data
# i = index of focal individual
# j = index of neighbor
# d = distance to neighbor
head(neighbors)
Next, we add columns for neighbor characteristics (e.g., species, size) and for whether the focal individual is located near the edge of the plot (blue area in Figure 2.3).
Code
# Add additional columns
# Add species for individual i
neighbors$sp_i = dat_first_census$sp[neighbors$i]

# Add whether individual i is near edge
neighbors$edge_i = dat_first_census$edge[neighbors$i]

# Add species for individual j
neighbors$sp_j = dat_first_census$sp[neighbors$j]

# Add basal area of individual j
neighbors$ba_j = dat_first_census$ba[neighbors$j]
We then want to add a column indicating whether each focal-neighbor pair is conspecific or heterospecific, because we are interested in separately estimating the densities of conspecific and heterospecific neighbors.
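The code chunk for this step did not survive rendering in this extract. A minimal sketch consistent with the `comparison_type` column used in the summaries later in this tutorial:

```r
# Label each focal-neighbor pair as conspecific ("cons") or
# heterospecific ("het") by comparing the species columns added above
neighbors$comparison_type <- ifelse(neighbors$sp_i == neighbors$sp_j,
                                    "cons", "het")
```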
We then remove focal trees that are too close to the edge of the plot:
Code
# Remove focal trees i that are at the edge
neighbors = neighbors[!neighbors$edge_i, ]
Next, we add columns to our neighbors data set for the distance decay multiplier and for the distance decay multiplier weighted by basal area:
Code
# Loop through distance decay values
for (x in 1:length(decay_values)) {

  # Add in column for distance decay multiplier for each decay value
  # Add _N suffix to column name - will eventually be summed based on
  # number of individual neighbors
  neighbors[, paste0(decay_names[x], "_N")] <-
    exponential_decay(mu = decay_values[x], distance = neighbors$d)

  # Weight distance decay multiplier by basal area of neighbor
  # Add _BA suffix to column name
  neighbors[, paste0(decay_names[x], "_BA")] <-
    exponential_decay(mu = decay_values[x], distance = neighbors$d) * neighbors$ba_j
}
Depending on how many distance decay values are being investigated, there may be many columns in the data frame.
Then we summarize neighborhood density for each focal tree separately for conspecifics and heterospecifics.
Code
# Simple calculations of number of neighbors and total basal area,
# ignoring distance decay
neighbors_summary <- neighbors %>%
  group_by(i, comparison_type) %>%
  summarise(nodecay_N = n(),         # count of neighbors
            nodecay_BA = sum(ba_j))  # sum of basal area

# Add in decay columns
neighbors_summary_decay <- neighbors %>%
  group_by(i, comparison_type) %>%
  # Select only columns related to distance decay
  select(starts_with("exp")) %>%
  # Summarize them all by summing columns
  summarise_all(sum)

# Join both together
neighbors_summary <- left_join(neighbors_summary, neighbors_summary_decay,
                               by = c("i", "comparison_type"))

# Add treeID column
neighbors_summary$treeID <- dat_first_census$treeID[neighbors_summary$i]

# If there are any focal individuals with no neighbors, add values
# of 0 for neighborhood densities
noNeighbors = dat_first_census$treeID[
  !dat_first_census$treeID %in% neighbors_summary$treeID &
  !dat_first_census$edge]

# If there are individuals with no neighbors
if (length(noNeighbors) > 0) {
  neighbors_summary = bind_rows(
    neighbors_summary,
    expand_grid(i = NA,
                treeID = noNeighbors,
                comparison_type = c("het", "cons"))) %>%
    # Add 0s where NA
    mutate_all(replace_na, replace = 0)
}

# Take a peek at the data
head(neighbors_summary)
2.6 Calculate total neighborhood density
As described in the main text, it can be advantageous to use total density, which includes both conspecific and heterospecific density, as a covariate, rather than only heterospecific density.
Here, we calculate overall density by summing heterospecific and conspecific densities.
Code
# First convert to long format which will make it easy to sum across
# heterospecific and conspecific values
neighbors_summary_long_format <- neighbors_summary %>%
  pivot_longer(cols = -c("i", "comparison_type", "treeID"))

# Sum across heterospecific and conspecific values and rename to 'all'
neighbors_total_long_format <- neighbors_summary_long_format %>%
  group_by(i, treeID, name) %>%
  summarize(value = sum(value)) %>%
  mutate(comparison_type = "all")

# Bind together conspecific and 'all' densities
# remove heterospecific columns
# fill in 0s where there are no neighbors
neighbors_summary_total = bind_rows(neighbors_summary_long_format,
                                    neighbors_total_long_format) %>%
  # Can filter out heterospecific neighborhood
  # values by uncommenting this line if the
  # objects become too large
  # filter(comparison_type != "het") %>%
  mutate(name = paste0(comparison_type, "_", name)) %>%
  select(-comparison_type) %>%
  pivot_wider(names_from = name, values_from = value, values_fill = 0)
2.7 Model mortality as a function of neighborhood density
To determine the ‘best’ decay parameter to use, we fit species-specific mortality models using Generalized Additive Models (GAMs) and compare models based on log likelihoods. GAMs are flexible statistical models that combine linear and non-linear components to capture complex relationships in data.
We first create our data set that we will use in the GAMs, subsetting down to just one census interval (because our example dataset only has 2 censuses) and removing trees close to the edge. In other datasets, you may have multiple census intervals, where it would be common practice to include ‘year’ or ‘census’ as a random effect in the model, but otherwise the overall approach is similar.
Code
# Join census data with neighborhood data
dat_gam <- left_join(dat_first_census, neighbors_summary_total,
                     by = "treeID")

# Remove edge trees
dat_gam <- dat_gam %>%
  filter(edge == FALSE)
For each species, we summarize data availability to help determine parameters for the GAM smooth terms.
Code
# Summarize data availability at species level to set the degree of
# smoothness for GAM smooth terms
sp_summary <- dat_gam %>%
  group_by(sp) %>%
  summarise(ndead = sum(mort_next),
            nsurv = sum(surv_next),
            range_con_BA = max(con_nodecay_BA) - min(con_nodecay_BA),
            max_con_BA = max(con_nodecay_BA),
            unique_con_BA = length(unique(con_nodecay_BA)),
            unique_all_BA = length(unique(all_nodecay_BA)),
            range_con_N = max(con_nodecay_N) - min(con_nodecay_N),
            max_con_N = max(con_nodecay_N),
            unique_con_N = length(unique(con_nodecay_N)),
            unique_all_N = length(unique(all_nodecay_N)),
            unique_dbh = length(unique(dbh)))

# Filter out species that have 0 or 100% mortality
sp_summary <- sp_summary %>%
  filter(ndead > 0) %>%
  filter(nsurv > 0)
In this long block of code, we loop over all possible combinations of decay values for neighborhood densities weighted by abundance (N) and size (BA) for each species and fit a separate GAM for each combination. For each GAM, we check whether the model fit successfully, whether it converged, and whether it produced warnings. We save successful model fits into a list that we process later.
For large datasets where individual GAMs take a long time to run, the code could be modified to run in parallel, either locally on a personal computer or across a computing cluster.
Code
# Initialize list that will save model outputs
res_mod <- list()

# Model run settings
run_settings <- expand_grid(species = unique(sp_summary$sp),
                            decay_con = c("nodecay", decay_names),
                            decay_total = c("nodecay", decay_names),
                            nhood_data_type = c("N", "BA"))

# Loop through model run settings
for (run_settings_row in 1:nrow(run_settings)) {

  # Extract values from run settings dataframe
  species <- run_settings$species[run_settings_row]
  decay_con <- run_settings$decay_con[run_settings_row]
  decay_total <- run_settings$decay_total[run_settings_row]
  nhood_data_type <- run_settings$nhood_data_type[run_settings_row]

  # Subset down to just focal species
  dat_subset <- dat_gam %>%
    filter(sp == species)

  # Set run name
  run_name <- paste0(species, "_total", decay_total,
                     "_con", decay_con, "_", nhood_data_type)

  # Print status if desired
  # cat("Working on run: ", run_name, " ...\n")

  # Create model formula
  # Initial DBH included as predictor variable
  form = paste0("mort_next ~ s(dbh, k = k1) + s(all_", decay_total, "_",
                nhood_data_type, ", k = k2) + s(con_", decay_con, "_",
                nhood_data_type, ", k = k3)")

  # Convert into formula
  form <- as.formula(form)

  # Choose penalties for model fitting
  # Set to default 10 (the same as -1)
  # The higher the value of k, the more flexible the smooth term
  # becomes, allowing for more intricate and wiggly patterns.
  # Conversely, lower values of k result in smoother and simpler
  # representations.
  k1 = k2 = k3 = 10
  if (k1 > sp_summary$unique_dbh[sp_summary$sp == species]) {
    k1 = sp_summary$unique_dbh[sp_summary$sp == species] - 2
  }
  if (k2 > sp_summary$unique_all_N[sp_summary$sp == species]) {
    k2 = sp_summary$unique_all_N[sp_summary$sp == species] - 2
  }
  if (k3 > sp_summary$unique_con_N[sp_summary$sp == species]) {
    k3 = sp_summary$unique_con_N[sp_summary$sp == species] - 2
  }

  # Fit model
  # Wrap in a try function to catch any errors
  mod = try(gam(form,
                family = binomial(link = cloglog),
                offset = log(interval),
                data = dat_subset,
                method = "REML"),
            silent = T)

  # Check if model was able to fit
  if (!any(class(mod) == "gam")) {
    # print(paste("gam failed for:", run_name))
  } else {
    # Check if gam converged
    if (!mod$converged) {
      # print(paste("no convergence for:", run_name))
    } else {
      # Check for complete separation
      # https://stats.stackexchange.com/questions/336424/issue-with-complete-separation-in-logistic-regression-in-r
      # Explore warning "glm.fit: fitted probabilities numerically 0
      # or 1 occurred"
      eps <- 10 * .Machine$double.eps
      glm0.resids <- augment(x = mod) %>%
        mutate(p = 1 / (1 + exp(-.fitted)),
               warning = p > 1 - eps,
               influence = order(.hat, decreasing = T))
      infl_limit = round(nrow(glm0.resids) / 10, 0)

      # If none of the warnings is among the 10% most influential
      # observations, then it is okay
      num = any(glm0.resids$warning & glm0.resids$influence < infl_limit)

      # Complete separation
      if (num) {
        # print(paste("complete separation is likely for:", run_name))
      } else {
        # Missing Vc
        if (is.null(mod$Vc)) {
          # print(paste("Vc not available for:", run_name))
        } else {
          # Add resulting model to list if it passes all checks
          res_mod[[run_name]] <- mod
        } # Vc ifelse
      } # complete separation ifelse
    } # convergence ifelse
  } # model available ifelse
} # end run settings loop
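As noted above, individual GAMs can be slow for large datasets, and the loop can be run in parallel. A minimal sketch using parallel::mclapply, assuming the loop body above has been wrapped in a hypothetical helper fit_one_run() that takes one row of run_settings and returns a fitted model or NULL:

```r
library(parallel)

# fit_one_run() is a hypothetical wrapper around the loop body above:
# subset the data, build the formula, fit the GAM, run the checks,
# and return the fitted model (or NULL on failure)
res_list <- mclapply(1:nrow(run_settings),
                     function(i) fit_one_run(run_settings[i, ]),
                     mc.cores = detectCores() - 1)  # use 1 core on Windows

# Name results as in the loop, then drop failed runs
names(res_list) <- paste0(run_settings$species,
                          "_total", run_settings$decay_total,
                          "_con", run_settings$decay_con,
                          "_", run_settings$nhood_data_type)
res_mod <- res_list[!sapply(res_list, is.null)]
```

On a computing cluster, the same pattern applies with a PSOCK cluster and parLapply instead of mclapply.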
2.8 Summarize model fits
Next, we will extract regression coefficients into a dataframe using broom::tidy():
Code
# Extract coefficients for each model into a list
coefs = lapply(res_mod, broom::tidy)

# Add a column for model run to each object in the list
coefs = Map(cbind, coefs, run_name = names(coefs))

# Bind elements of list together by rows
coefs = do.call(rbind, coefs)

# Remove row names
rownames(coefs) <- NULL

# Rearrange columns
coefs <- coefs %>% select(run_name, everything())

# Take a look at the data
knitr::kable(head(coefs), digits = 2, booktabs = T) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
| run_name | term | edf | ref.df | statistic | p.value |
|---|---|---|---|---|---|
| cappfr_totalnodecay_connodecay_N | s(dbh) | 1.00 | 1.00 | 2.46 | 0.12 |
| cappfr_totalnodecay_connodecay_N | s(all_nodecay_N) | 1.23 | 1.43 | 1.25 | 0.31 |
| cappfr_totalnodecay_connodecay_N | s(con_nodecay_N) | 1.00 | 1.00 | 1.84 | 0.17 |
| cappfr_totalnodecay_connodecay_BA | s(dbh) | 4.36 | 5.31 | 9.83 | 0.09 |
| cappfr_totalnodecay_connodecay_BA | s(all_nodecay_BA) | 1.00 | 1.00 | 0.36 | 0.55 |
| cappfr_totalnodecay_connodecay_BA | s(con_nodecay_BA) | 1.00 | 1.00 | 3.04 | 0.08 |
Next, we will extract model summaries for each model with broom::glance(), which provides key information such as degrees of freedom, log likelihood, and AIC.
Code
# Extract summaries for each model into a list
sums = lapply(res_mod, broom::glance)

# Add a column for model run to each object in the list
sums = Map(cbind, sums, run_name = names(sums))

# Bind elements of list together by rows
sums = do.call(rbind, sums)

# Remove row names
rownames(sums) <- NULL

# Separate run name into columns for species, decay, and density type
sums <- sums %>%
  separate(run_name, into = c("sp", "decay_total", "decay_con",
                              "density_type"), remove = FALSE)

# Remove 'total' and 'con' from decay columns
sums$decay_total <- gsub("total", "", sums$decay_total)
sums$decay_con <- gsub("con", "", sums$decay_con)

# Rearrange columns
sums <- sums %>%
  select(run_name, sp, decay_total, decay_con, density_type, everything())

# Take a look at the model summaries
knitr::kable(head(sums), digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
| run_name | sp | decay_total | decay_con | density_type | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| cappfr_totalnodecay_connodecay_N | cappfr | nodecay | nodecay | N | 4.23 | -22.62 | 54.10 | 70.37 | 45.23 | 285.77 | 290 |
| cappfr_totalnodecay_connodecay_BA | cappfr | nodecay | nodecay | BA | 7.36 | -17.21 | 51.03 | 81.52 | 34.41 | 282.64 | 290 |
| cappfr_totalexp01_connodecay_N | cappfr | exp01 | nodecay | N | 5.06 | -22.64 | 56.52 | 77.14 | 45.29 | 284.94 | 290 |
| cappfr_totalexp01_connodecay_BA | cappfr | exp01 | nodecay | BA | 8.08 | -17.54 | 53.90 | 88.44 | 35.07 | 281.92 | 290 |
| cappfr_totalexp03_connodecay_N | cappfr | exp03 | nodecay | N | 8.07 | -17.87 | 54.98 | 90.30 | 35.73 | 281.93 | 290 |
| cappfr_totalexp03_connodecay_BA | cappfr | exp03 | nodecay | BA | 7.93 | -17.62 | 53.85 | 87.98 | 35.24 | 282.07 | 290 |
Due to limited sample sizes, it is likely that GAMs will not fit successfully for every species. The second table below summarizes species whose models did not all fit or converge; in several cases, few individuals of the species died during the census interval, leaving few mortality events to inform the model.
We need to exclude species without complete model runs from our overall calculations when looking for optimal decay parameters across the entire data set.
Code
# Tally up number of model runs by species and total decay values
table(sums$sp, sums$decay_total)

# Get incomplete run-site-species combinations
run_counts_by_sp <- sums %>%
  group_by(sp) %>%
  tally() %>%
  # Join with overall species list
  left_join(sp_summary %>% select(sp), ., by = "sp")

# Get expected number of runs if all models work
expected_total_runs <- run_settings %>%
  group_by(species) %>%
  tally() %>%
  pull(n) %>%
  max()

# Save species names where they didn't have all expected combinations
# of model runs
incomplete = run_counts_by_sp$sp[run_counts_by_sp$n < expected_total_runs |
                                   is.na(run_counts_by_sp$n)]

# Species with successful runs
knitr::kable(sp_summary[!sp_summary$sp %in% incomplete, ], digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
| sp | ndead | nsurv | range_con_BA | max_con_BA | unique_con_BA | unique_all_BA | range_con_N | max_con_N | unique_con_N | unique_all_N | unique_dbh |
|---|---|---|---|---|---|---|---|---|---|---|---|
| cappfr | 5 | 285 | 0.02 | 0.02 | 273 | 290 | 30 | 32 | 31 | 111 | 31 |
| des2pa | 174 | 1172 | 0.14 | 0.15 | 1335 | 1343 | 123 | 140 | 123 | 157 | 77 |
| micone | 38 | 28 | 0.00 | 0.00 | 48 | 66 | 12 | 12 | 13 | 55 | 11 |
| stylst | 3 | 101 | 0.02 | 0.02 | 91 | 104 | 13 | 13 | 14 | 68 | 30 |
Code
# Species without successful runs
knitr::kable(sp_summary[sp_summary$sp %in% incomplete, ], digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
| sp | ndead | nsurv | range_con_BA | max_con_BA | unique_con_BA | unique_all_BA | range_con_N | max_con_N | unique_con_N | unique_all_N | unique_dbh |
|---|---|---|---|---|---|---|---|---|---|---|---|
| acalma | 2 | 5 | 0.00 | 0.00 | 4 | 7 | 2 | 2 | 2 | 7 | 6 |
| cordbi | 7 | 36 | 0.25 | 0.25 | 36 | 43 | 10 | 10 | 11 | 38 | 28 |
| pentma | 12 | 21 | 0.00 | 0.00 | 20 | 33 | 6 | 6 | 6 | 31 | 11 |
| sponra | 3 | 15 | 0.08 | 0.08 | 12 | 18 | 5 | 5 | 4 | 17 | 16 |
2.9 Selecting optimum decay parameter values across all species
We then summarize model fit criteria across all species runs. To identify optimal decay parameter values, we sum log likelihoods across all species for each decay parameter combination and choose the combination with the highest summed log likelihood.
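The code chunk that produces this summary did not survive rendering in this extract. A minimal reconstruction consistent with the output below, assuming species with incomplete runs are dropped using the `incomplete` vector created above (the object name `sums_total` matches its use in the plotting code later):

```r
library(dplyr)

# Sum log likelihoods across species with complete model runs for
# each combination of decay parameters and density type
sums_total <- sums %>%
  filter(!sp %in% incomplete) %>%
  group_by(decay_total, decay_con, density_type) %>%
  summarize(nvalues = n(),
            sumlogLik = sum(logLik),
            meanlogLik = mean(logLik))

sums_total
```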
# A tibble: 392 × 6
# Groups: decay_total, decay_con [196]
decay_total decay_con density_type nvalues sumlogLik meanlogLik
<chr> <chr> <chr> <int> <dbl> <dbl>
1 exp01 exp01 BA 4 -556. -139.
2 exp01 exp01 N 4 -554. -138.
3 exp01 exp03 BA 4 -553. -138.
4 exp01 exp03 N 4 -558. -140.
5 exp01 exp05 BA 4 -554. -138.
6 exp01 exp05 N 4 -561. -140.
7 exp01 exp07 BA 4 -557. -139.
8 exp01 exp07 N 4 -563. -141.
9 exp01 exp09 BA 4 -558. -140.
10 exp01 exp09 N 4 -564. -141.
# ℹ 382 more rows
We create a heatmap (Figure 2.4) of summed log likelihoods for all parameter combinations across all species, with the optimal parameter combination marked with an X:
Code
# Find optimum value separately for N and BA
optimum <- sums_total %>%
  group_by(density_type) %>%
  slice_max(sumlogLik)

# Plot heatmap of log likelihood values
ggplot(sums_total, aes(x = decay_total, y = decay_con, fill = sumlogLik)) +
  geom_tile(width = 0.9, height = 0.9, col = "black") +
  scale_fill_viridis_c() + # viridis color palette
  geom_label(data = optimum, label = "X") +
  labs(x = "Decay total density", y = "Decay conspecific density",
       fill = "sumlogLik") +
  facet_wrap(~density_type, ncol = 1) +
  theme_bw(12) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Figure 2.4: Heatmap of optimal values for decay constants
For this data set, the following are the optimal decay parameter values across all species separately for neighborhood density calculated by abundance (N) and by basal area (BA)
2.10 Selecting optimum decay parameter values separately for each species
Alternatively, it may be useful to determine optimum decay parameters on a species-by-species basis. To do this, we find the maximum log likelihood for each species separately across all decay parameter combinations and choose the combination with the highest log likelihood.
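As above, the code chunk for this per-species summary is not shown in this extract. A minimal reconstruction consistent with the output below (the object name `sums_by_sp` is our own; only species with complete runs are kept):

```r
library(dplyr)

# Summarize log likelihood for each species separately for each
# combination of decay parameters and density type
sums_by_sp <- sums %>%
  filter(!sp %in% incomplete) %>%
  group_by(decay_total, decay_con, density_type, sp) %>%
  summarize(nvalues = n(),
            sumlogLik = sum(logLik),
            meanlogLik = mean(logLik))

sums_by_sp
```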
# A tibble: 1,568 × 7
# Groups: decay_total, decay_con, density_type [392]
decay_total decay_con density_type sp nvalues sumlogLik meanlogLik
<chr> <chr> <chr> <chr> <int> <dbl> <dbl>
1 exp01 exp01 BA cappfr 1 -16.6 -16.6
2 exp01 exp01 BA des2pa 1 -489. -489.
3 exp01 exp01 BA micone 1 -43.7 -43.7
4 exp01 exp01 BA stylst 1 -6.84 -6.84
5 exp01 exp01 N cappfr 1 -14.3 -14.3
6 exp01 exp01 N des2pa 1 -489. -489.
7 exp01 exp01 N micone 1 -41.3 -41.3
8 exp01 exp01 N stylst 1 -8.64 -8.64
9 exp01 exp03 BA cappfr 1 -20.7 -20.7
10 exp01 exp03 BA des2pa 1 -485. -485.
# ℹ 1,558 more rows
We create a heatmap (Figure 2.5) of log likelihoods for all parameter combinations for each species separately, with the optimal parameter combination marked with an X. We display only the first three species here; for data sets containing many species, it is hard to visualize all species in one graph, and you may need to subdivide the plot further for visualization.
Code
sums <- sums %>%
  filter(sp %in% c("cappfr", "cordbi", "des2pa")) %>%
  group_by(sp) %>%
  # Scale log likelihood by species to help with visualization
  mutate(logLik_scaled = scale(logLik)) %>%
  ungroup()

# Find optimum value separately for N and BA
optimum_by_sp <- sums %>%
  group_by(sp, density_type) %>%
  slice_max(logLik_scaled, with_ties = FALSE)

# Plot heatmap of log likelihood values
ggplot(sums, aes(x = decay_total, y = decay_con, fill = logLik_scaled)) +
  geom_tile(width = 0.9, height = 0.9, col = "black") +
  geom_label(data = optimum_by_sp, label = "X") +
  labs(x = "Decay total density", y = "Decay conspecific density",
       fill = "logLik") +
  facet_wrap(~sp + density_type, ncol = 2, scales = "free") +
  scale_fill_viridis_c() + # viridis color palette
  theme_bw(12) +
  theme(legend.position = "right") +
  labs(fill = "Log likelihood\n(scaled)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Figure 2.5: Heatmap of optimal values for decay constants separately for each species
For this data set, the following are the optimal decay parameter values for each species separately for neighborhood density calculated by abundance (N) and by basal area (BA):
Barber, Cristina, Andrii Zaiats, Cara Applestein, Lisa Rosenthal, and T. Trevor Caughlin. 2022. “Bayesian Models for Spatially Explicit Interactions Between Neighbouring Plants.” Methods in Ecology and Evolution.
Canham, Charles D., Michael J. Papaik, María Uriarte, William H. McWilliams, Jennifer C. Jenkins, and Mark J. Twery. 2006. “Neighborhood Analyses of Canopy Tree Competition Along Environmental Gradients in New England Forests.” Ecological Applications 16 (2): 540–54.
Uriarte, María, Nathan G. Swenson, Robin L. Chazdon, Liza S. Comita, W. John Kress, David Erickson, Jimena Forero-Montaña, Jess K. Zimmerman, and Jill Thompson. 2010. “Trait Similarity, Shared Ancestry and the Structure of Neighbourhood Interactions in a Subtropical Wet Forest: Implications for Community Assembly.” Ecology Letters 13 (12): 1503–14.
---output: html_documentprefer-html: trueeditor_options: chunk_output_type: consoleexecute: cache: true---# Calculating neighborhood densities## OverviewThis tutorial follows from the Section 2. "Modeling considerations when estimating CDD" of the main text:::: callout-note## Section 2. "Modeling considerations when estimating CDD"For survival and growth models, defining the density metric and the size of local neighborhoods also require consideration. The density of conspecific and heterospecific neighbors surrounding a focal individual (or plot) can be measured within different distances (i.e., radii). Within a radius, neighbors should be weighted in the density summation with regard to their size and distance to the focal individuals, with the rationale that neighbors at a greater distance and of smaller size have smaller effects. Typically, weighting involves dividing the basal area or diameter by a linear function or an exponential function of distance to allow competitive effects of an individual of a given size to decline with distance [@uriarte2010trait; @canham2006neighborhood]. Biologically meaningful radii should be tested based on the size/life stage, vital rate of interest, and the likely agents of density dependence in the system. For a system of interest, we suggest comparing models with multiple distances (e.g., using maximum likelihood or information criterion) because forests may differ in how neighbors influence performance.:::Here, we demonstrate how to calculate the density of conspecific, heterospecific, and all neighbors surrounding a focal individual or plot for a given distance when XY coordinates are known. As a result, this section requires that the data set contains the location of mapped stems. 
If that is not available (as is common in seedling studies or smaller plots), continue to part 2 of this appendix.We then demonstrate how to weight the calculation of neighborhood density by individual size (*i.e.,* basal area) and distance using an exponential decay function (one of many possible decay functions), allowing the competitive effects of density effects to saturate [@uriarte2010trait; @canham2006neighborhood].To assess which shape parameters of the exponential decay function is most appropriate for the data set, we fit models with multiple combinations of decay function values and compare models using log likelihood.From a computational perspective, this approach can be relatively resource intensive both in terms of time and object size. It's possible to make this approach more efficient by subdividing the data (*e.g.,* by plot) or using more efficient data structures such as [data.table](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html).We also note that alternative approaches allow the estimation of the effective scale of neighborhood interactions directly from data [@barber2022bayesian]. 
An excellent case study using Stan is [available here](https://mc-stan.org/users/documentation/case-studies/plantInteractions.html).

::: callout-note
The following code is adapted from the [latitudinalCNDD repository](https://github.com/LisaHuelsmann/latitudinalCNDD/tree/main/code) by [Lisa Hülsmann](https://demographicecology.com/).
:::

## Load libraries and data

```{r, message=FALSE}
# Load libraries
library(tidyr)
library(dplyr)
library(ggplot2)
library(parallel)
library(here)
library(spatstat.geom)
library(mgcv)       # For fitting GAMs
library(lubridate)  # For calculating census intervals
library(broom)      # For processing fitted models
library(purrr)
library(kableExtra) # For table styling
```

## Data format explanation

For this tutorial, we will be using an example data set from Barro Colorado Island (BCI) ([available here](https://github.com/LisaHuelsmann/latitudinalCNDD/tree/main/data_prep/input/data_tree)) that includes 7,028 observations of 3,771 individuals of 16 species across two censuses within the 50 ha BCI forest dynamics plot ([more information here](http://ctfs.si.edu/webatlas/datasets/bci/)). Each stem is individually mapped, which allows us to calculate neighborhood density across different distance thresholds.

The code below assumes the data is in a format where each row is an observation for an individual from a census.
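One relationship worth noting before the column descriptions: under the usual circular-stem approximation, basal area in m^2 follows directly from dbh in mm. We have not verified that the example data's `ba` column was computed exactly this way, so treat this as a reference sketch:

```{r}
# Basal area (m^2) from dbh (mm), assuming a circular stem cross-section:
# dbh / 2 gives the radius in mm; dividing by 1000 converts mm to m,
# hence the combined divisor of 2000
dbh_to_ba <- function(dbh_mm) {
  pi * (dbh_mm / 2000)^2
}

dbh_to_ba(100)  # a 100 mm dbh stem has ~0.00785 m^2 of basal area
```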
For this data set, the column descriptions are as follows:

- **treeID**: unique identifier for each tree
- **sp**: species code
- **gx**: spatial coordinate on x axis
- **gy**: spatial coordinate on y axis
- **dbh**: diameter at breast height (mm)
- **ba**: basal area (m^2)
- **status**: status at census, A = alive, D = dead
- **date**: date of observation
- **census**: census number
- **surv**: survival status at census, 1 = alive, 0 = dead
- **surv_next**: survival status at next census, 1 = alive, 0 = dead
- **mort**: mortality status at census, 1 = dead, 0 = alive
- **mort_next**: mortality status at next census, 1 = dead, 0 = alive
- **interval**: time interval between censuses in years

```{r echo=FALSE}
# This code is not shown in report
# Load in and clean example BCI data
# Loads in a list object called 'tree', separated by census
load(here("./data/site01_tree.Rdata"))

# Restructure data for mortality analysis
for (census in 1:(length(tree) - 1)) {
  tree[[census]]$surv <- ifelse(grepl("A", tree[[census]]$status), 1, 0)          # status at t=0
  tree[[census]]$surv_next <- ifelse(grepl("A", tree[[census + 1]]$status), 1, 0) # status at t=1
  tree[[census]]$mort <- ifelse(grepl("A", tree[[census]]$status), 0, 1)          # status at t=0
  tree[[census]]$mort_next <- ifelse(grepl("A", tree[[census + 1]]$status), 0, 1) # status at t=1
  tree[[census]]$interval <- time_length(difftime(tree[[census + 1]]$date,
                                                  tree[[census]]$date), "years")  # length of interval
}

# Bind together into single tibble
dat <- tibble(bind_rows(tree[[1]] %>% mutate(census = 1),
                        tree[[2]] %>% mutate(census = 2)) %>%
                dplyr::select(-c(homchange, nostems))) %>%
  filter(status %in% c("A", "D"))
```

Let's take a quick look at the data set we'll be working with:

```{r}
head(dat, n = 5)
```

------------------------------------------------------------------------

We can produce a map (@fig-map) of the tree locations, where the size of the point is scaled to basal area and colored by species, with each census displayed as a panel:

```{r, fig.width = 8}
#| label: fig-map
#| fig-cap: "Map of tree locations by census"
ggplot(dat, aes(x = gx, y = gy, size = ba, col = sp)) +
  geom_point() +
  facet_wrap(~census) +
  theme_bw(10) +
  theme(legend.position = "right") +
  labs(size = "Basal area", col = "Species")
```

## Define exponential decay function

We will demonstrate how to calculate neighborhood densities using an exponential decay function. In principle, it's possible to use any number of different decay functions and to explore various ranges of values that determine the rate of decay. Here, we present the general framework for how one might explore different scenarios, though the ultimate choice of decay function and parameter values will depend on the study. **Note that this method of calculating neighborhood density is only possible when neighbors' x-y coordinates are known.**

```{r}
exponential_decay <- function(mu, distance) {
  return(exp(-(1 / mu * distance)))
}
```

Let's see what the exponential decay function looks like across a range of mu values (@fig-decay):

```{r}
#| label: fig-decay
#| fig-cap: "Plot of decay function"

# Set range of mu values and distances
decay_values <- seq(from = 1, to = 25, by = 2)

# Use sprintf to add leading 0s, which will help with sorting later on
decay_names <- paste("exp", sprintf("%02d", decay_values), sep = "")

distances <- seq(1, 100, 1)

# Generate a dataframe with each combination of mu and distance
example_decay_values <- expand_grid(decay_values, distances) %>%
  # Rename columns
  rename(decay_value = decay_values, distance = distances)

# Evaluate distance decay function for each combination of
# mu and distance
example_decay_values$decay <- exponential_decay(
  mu = example_decay_values$decay_value,
  distance = example_decay_values$distance)

# Plot results
ggplot(example_decay_values, aes(x = distance, y = decay,
                                 color = decay_value, group = decay_value)) +
  ylab("Density") +
  xlab("Distance (m)") +
  scale_color_continuous(name = "Value of \ndecay constant") +
  geom_line() +
  theme_bw(12)
```

### Determine which trees are at the edge of the plot

Trees near the edge of plot
boundaries have incomplete information about their neighborhoods, because trees outside the plot boundaries are not mapped. Typically, trees within a certain distance threshold of the plot edge are excluded from analysis as focal individuals, but are still included in calculations of neighborhood densities.

We are going to add a column to our data set called 'edge' that is TRUE if a tree is within a set distance of the edge of the plot. For this example data set, the dimensions of the plot are 300 x 300 m, ranging from 0-300 on both the x and y axes, representing a subset of the overall 50 ha forest dynamics plot at BCI.

```{r}
# Set threshold for distance to edge
distance_threshold_edge <- 30

# Add in min and max values for corners of plot
min_x <- 0
max_x <- 300
min_y <- 0
max_y <- 300

dat$edge <- dat$gx < min_x + distance_threshold_edge |
  dat$gx > max_x - distance_threshold_edge |
  dat$gy < min_y + distance_threshold_edge |
  dat$gy > max_y - distance_threshold_edge

# How many trees fall within the edge threshold?
table(dat$edge)
```

Below is a map (@fig-edge) of tree locations colored by whether they fall within the edge threshold, separated out for each census.

```{r}
#| label: fig-edge
#| fig-cap: "Map of tree locations colored by whether they fall within the edge threshold"
ggplot(dat, aes(x = gx, y = gy, col = edge)) +
  geom_point() +
  facet_wrap(~census) +
  coord_fixed(ratio = 1) +
  theme_bw(12)
```

## Calculate distances among focal individuals and neighbors

Next, we will calculate distances among individuals in our plot, using an upper threshold distance for what to consider a neighbor. We use the ['spatstat.geom' package](https://spatstat.org/) to efficiently calculate distances among individuals.

Because this example only includes one census interval (two censuses total), we will subset down to just the first census to calculate neighborhood density. If the data set were to contain multiple census intervals, it would be necessary to calculate neighborhood density separately for each census interval, using only
the individuals that were alive at the beginning of that census interval.

```{r}
# Set distance threshold for considering neighbors
distance_threshold_neighborhood <- 30

# Subset to first census - this will be different for different
# datasets
dat_first_census <- dat %>%
  filter(census == 1)

# Format into 'ppp' object
dat_ppp <- spatstat.geom::ppp(dat_first_census$gx,
                              dat_first_census$gy,
                              window = owin(range(dat$gx), range(dat$gy)),
                              checkdup = FALSE)

# Determine close pairs based on distance threshold
# Returns a list that we convert to a tibble later
neighbors <- spatstat.geom::closepairs(dat_ppp,
                                       rmax = distance_threshold_neighborhood, # Max radius
                                       what = "ijd", # return indices i, j, and distance
                                       twice = TRUE)

# Convert to dataframe
neighbors <- as_tibble(neighbors)

# Take a peek at the data
# i = index of focal individual
# j = index of neighbor
# d = distance to neighbor
head(neighbors)
```

Next, we add columns for neighbor characteristics (*e.g.,* species, size) and for whether individuals are located near the edge of the plot (blue area in @fig-edge).

```{r}
# Add additional columns
# Add species for individual i
neighbors$sp_i <- dat_first_census$sp[neighbors$i]

# Add whether individual i is near edge
neighbors$edge_i <- dat_first_census$edge[neighbors$i]

# Add species for individual j
neighbors$sp_j <- dat_first_census$sp[neighbors$j]

# Add basal area of individual j
neighbors$ba_j <- dat_first_census$ba[neighbors$j]
```

We then add a column indicating whether the comparison between the focal individual and the neighbor is conspecific or heterospecific, because we are interested in separately estimating the densities of conspecifics and heterospecifics.

```{r}
neighbors$comparison_type <- ifelse(neighbors$sp_i == neighbors$sp_j,
                                    yes = "con", # conspecific
                                    no = "het")  # heterospecific
```

We then remove focal trees that are too close to the edge of the plot.

```{r}
# Remove focal trees i that are at the edge
neighbors <- neighbors[!neighbors$edge_i, ]
```

Next, we add columns to our neighbors data set that
indicate the distance decay multiplier and the distance decay multiplier weighted by basal area.

```{r}
# Loop through distance decay values
for (x in 1:length(decay_values)) {

  # Add in column for distance decay multiplier for each decay value
  # Add _N suffix to column name - will eventually be summed based on
  # number of individual neighbors
  neighbors[, paste0(decay_names[x], "_N")] <- exponential_decay(
    mu = decay_values[x],
    distance = neighbors$d)

  # Weight distance decay multiplier by basal area of neighbor
  # Add _BA suffix to column name
  neighbors[, paste0(decay_names[x], "_BA")] <- exponential_decay(
    mu = decay_values[x],
    distance = neighbors$d) * neighbors$ba_j
}
```

Depending on how many distance decay values are being investigated, there may be many columns in the data frame.

```{r}
head(neighbors)
```

## Calculate neighborhood density

We then summarize neighborhood density for each focal tree, separately for conspecifics and heterospecifics.

```{r, message=FALSE}
# Simple calculations of number of neighbors and total basal area,
# ignoring distance decay
neighbors_summary <- neighbors %>%
  group_by(i, comparison_type) %>%
  summarise(nodecay_N = n(),        # count of neighbors
            nodecay_BA = sum(ba_j)) # sum of basal area

# Add in decay columns
neighbors_summary_decay <- neighbors %>%
  group_by(i, comparison_type) %>%
  # Select only columns related to distance decay
  select(starts_with("exp")) %>%
  # Summarize them all by summing columns
  summarise_all(sum)

# Join both together
neighbors_summary <- left_join(neighbors_summary, neighbors_summary_decay,
                               by = c("i", "comparison_type"))

# Add treeID column
neighbors_summary$treeID <- dat_first_census$treeID[neighbors_summary$i]

# If there are any focal individuals with no neighbors, add values
# of 0 for neighborhood densities
noNeighbors <- dat_first_census$treeID[!dat_first_census$treeID %in% neighbors_summary$treeID &
                                         !dat_first_census$edge]

# If there are individuals with no neighbors
if (length(noNeighbors) > 0) {
  neighbors_summary <- bind_rows(neighbors_summary,
    expand_grid(i = NA,
                treeID = noNeighbors,
                comparison_type = c("het", "con"))) %>% # 'con' matches earlier coding
    # Add 0s where NA
    mutate_all(replace_na, replace = 0)
}

# Take a peek at the data
head(neighbors_summary)
```

As described in the main text, it can be advantageous to use total density, which includes both conspecific and heterospecific density, as a covariate, rather than heterospecific density alone. Here, we calculate overall density by summing heterospecific and conspecific densities.

```{r, message=FALSE}
# First convert to long format, which will make it easy to sum across
# heterospecific and conspecific values
neighbors_summary_long_format <- neighbors_summary %>%
  pivot_longer(cols = -c("i", "comparison_type", "treeID"))

# Sum across heterospecific and conspecific values and rename to 'all'
neighbors_total_long_format <- neighbors_summary_long_format %>%
  group_by(i, treeID, name) %>%
  summarize(value = sum(value)) %>%
  mutate(comparison_type = "all")

# Bind together conspecific and 'all' densities
# Remove heterospecific columns
# Fill in 0s where there are no neighbors
neighbors_summary_total <- bind_rows(neighbors_summary_long_format,
                                     neighbors_total_long_format) %>%
  # Can filter out heterospecific neighborhood
  # values by uncommenting this line if the
  # objects become too large
  # filter(comparison_type != "het") %>%
  mutate(name = paste0(comparison_type, "_", name)) %>%
  select(-comparison_type) %>%
  pivot_wider(names_from = name, values_from = value, values_fill = 0)
```

## Model mortality as a function of neighborhood density

To determine the 'best' decay parameter to use, we fit species-specific mortality models using generalized additive models ([GAMs](https://noamross.github.io/gams-in-r-course/)) and compare models based on log likelihoods.
GAMs are flexible statistical models that combine linear and non-linear components to capture complex relationships in data.

We first create the data set that we will use in the GAMs, subsetting down to just one census interval (because our example data set has only two censuses) and removing trees close to the edge. Other data sets may have multiple census intervals, in which case it would be common practice to include 'year' or 'census' as a random effect in the model; otherwise, the overall approach is similar.

```{r, message=FALSE}
# Join census data with neighborhood data
dat_gam <- left_join(dat_first_census, neighbors_summary_total,
                     by = "treeID")

# Remove edge trees
dat_gam <- dat_gam %>%
  filter(edge == FALSE)
```

For each species, we summarize data availability to help determine parameters for the GAM smooth terms.

```{r}
# Summarize data availability at species level to set the degree of
# smoothness for GAM smooth terms
sp_summary <- dat_gam %>%
  group_by(sp) %>%
  summarise(ndead = sum(mort_next),
            nsurv = sum(surv_next),
            range_con_BA = max(con_nodecay_BA) - min(con_nodecay_BA),
            max_con_BA = max(con_nodecay_BA),
            unique_con_BA = length(unique(con_nodecay_BA)),
            unique_all_BA = length(unique(all_nodecay_BA)),
            range_con_N = max(con_nodecay_N) - min(con_nodecay_N),
            max_con_N = max(con_nodecay_N),
            unique_con_N = length(unique(con_nodecay_N)),
            unique_all_N = length(unique(all_nodecay_N)),
            unique_dbh = length(unique(dbh)))

# Filter out species that have 0 or 100% mortality
sp_summary <- sp_summary %>%
  filter(ndead > 0) %>%
  filter(nsurv > 0)
```

In this long block of code, we loop over all combinations of decay values for neighborhood densities weighted by abundance (N) and size (BA) for each species and fit a separate GAM for each combination. For each GAM, we check whether the model could be fit and, if so, whether it converged and produced any warnings.
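A note on the model family used in the loop below: mortality is modeled with a binomial GAM using a complementary log-log (cloglog) link and a log(interval) offset, a standard way to handle unequal census intervals. Under this specification, the exponentiated linear predictor acts as a constant annual mortality hazard, so survival compounds multiplicatively across years. A small sketch of that algebra (eta is a hypothetical linear-predictor value, not fitted from these data):

```{r}
# With the cloglog link and offset log(interval):
#   P(death over t years) = 1 - exp(-exp(eta + log(t))) = 1 - exp(-t * exp(eta)),
# so exp(eta) behaves as an annual mortality hazard
mort_prob <- function(eta, interval) {
  1 - exp(-exp(eta + log(interval)))
}

p1 <- mort_prob(-2, interval = 1) # one-year mortality probability
p5 <- mort_prob(-2, interval = 5) # five-year mortality probability

# Survival compounds multiplicatively across years
all.equal(1 - p5, (1 - p1)^5) # TRUE
```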
We save the results of successful model fits into a list that we will process later. For large data sets where individual GAMs take a long time to run, the code could be modified to run in parallel, either locally on a personal computer or across a computing cluster.

```{r, message=FALSE, warning=FALSE}
# Initialize list that will save model outputs
res_mod <- list()

# Model run settings
run_settings <- expand_grid(species = unique(sp_summary$sp),
                            decay_con = c("nodecay", decay_names),
                            decay_total = c("nodecay", decay_names),
                            nhood_data_type = c("N", "BA"))

# Loop through model run settings
for (run_settings_row in 1:nrow(run_settings)) {

  # Extract values from run settings dataframe
  species <- run_settings$species[run_settings_row]
  decay_con <- run_settings$decay_con[run_settings_row]
  decay_total <- run_settings$decay_total[run_settings_row]
  nhood_data_type <- run_settings$nhood_data_type[run_settings_row]

  # Subset down to just focal species
  dat_subset <- dat_gam %>%
    filter(sp == species)

  # Set run name
  run_name <- paste0(species, "_total", decay_total,
                     "_con", decay_con, "_", nhood_data_type)

  # Print status if desired
  # cat("Working on run: ", run_name, " ...\n")

  # Create model formula
  # Initial DBH included as predictor variable
  form <- paste0("mort_next ~ s(dbh, k = k1) + s(all_", decay_total, "_",
                 nhood_data_type, ", k = k2) + s(con_", decay_con, "_",
                 nhood_data_type, ", k = k3)")

  # Convert into formula
  form <- as.formula(form)

  # Choose basis dimensions (k) for the smooth terms
  # Set to default 10 (the same as -1)
  # The higher the value of k, the more flexible the smooth term
  # becomes, allowing for more intricate and wiggly patterns.
  # Conversely, lower values of k result in smoother and simpler
  # representations.
  k1 <- k2 <- k3 <- 10
  if (k1 > sp_summary$unique_dbh[sp_summary$sp == species]) {
    k1 <- sp_summary$unique_dbh[sp_summary$sp == species] - 2
  }
  if (k2 > sp_summary$unique_all_N[sp_summary$sp == species]) {
    k2 <- sp_summary$unique_all_N[sp_summary$sp == species] - 2
  }
  if (k3 > sp_summary$unique_con_N[sp_summary$sp == species]) {
    k3 <- sp_summary$unique_con_N[sp_summary$sp == species] - 2
  }

  # Fit model
  # Wrap in a try function to catch any errors
  mod <- try(gam(form,
                 family = binomial(link = cloglog),
                 offset = log(interval),
                 data = dat_subset,
                 method = "REML"),
             silent = TRUE)

  # Check if model was able to fit
  if (!any(class(mod) == "gam")) {
    # print(paste("gam failed for:", run_name))
  } else {
    # Check if gam converged
    if (!mod$converged) {
      # print(paste("no convergence for:", run_name))
    } else {
      # Check for complete separation
      # https://stats.stackexchange.com/questions/336424/issue-with-complete-separation-in-logistic-regression-in-r
      # Explore warning "glm.fit: fitted probabilities numerically 0
      # or 1 occurred"
      eps <- 10 * .Machine$double.eps
      glm0.resids <- augment(x = mod) %>%
        mutate(p = 1 / (1 + exp(-.fitted)),
               warning = p > 1 - eps,
               influence = order(.hat, decreasing = TRUE))
      infl_limit <- round(nrow(glm0.resids) / 10, 0)

      # If none of the warnings is among the 10% most
      # influential observations, then it is okay
      num <- any(glm0.resids$warning & glm0.resids$influence < infl_limit)

      # Complete separation
      if (num) {
        # print(paste("complete separation is likely for:", run_name))
      } else {
        # Missing Vc
        if (is.null(mod$Vc)) {
          # print(paste("Vc not available for:", run_name))
        } else {
          # Add resulting model to list if it passes all checks
          res_mod[[run_name]] <- mod
        } # Vc ifelse
      } # complete separation ifelse
    } # convergence ifelse
  } # model available ifelse
} # end run settings loop
```

## Summarize model fits

Next, we extract regression coefficients into a dataframe using `broom::tidy()`.

```{r}
# Extract coefficients for each model into a list
coefs <- lapply(res_mod, broom::tidy)

# Add a column for model run to each object in the list
coefs <- Map(cbind, coefs, run_name = names(coefs))

# Bind elements of list together by rows
coefs <- do.call(rbind, coefs)

rownames(coefs) <- NULL # Remove row names
coefs <- coefs %>% select(run_name, everything()) # Rearrange columns

# Take a look at the data
knitr::kable(head(coefs), digits = 2, booktabs = TRUE) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
```

Next, we extract model summaries for each model with `broom::glance()`, which provides key information such as degrees of freedom, log likelihood, and AIC.

```{r}
# Extract summaries for each model into a list
sums <- lapply(res_mod, broom::glance)

# Add a column for model run to each object in the list
sums <- Map(cbind, sums, run_name = names(sums))

# Bind elements of list together by rows
sums <- do.call(rbind, sums)

rownames(sums) <- NULL # Remove row names

# Separate run name into columns for species, decay, and density type
sums <- sums %>%
  separate(run_name,
           into = c("sp", "decay_total", "decay_con", "density_type"),
           remove = FALSE)

# Remove 'total' and 'con' from decay columns
sums$decay_total <- gsub("total", "", sums$decay_total)
sums$decay_con <- gsub("con", "", sums$decay_con)

# Rearrange columns
sums <- sums %>%
  select(run_name, sp, decay_total, decay_con, density_type, everything())

# Take a look at the model summaries
knitr::kable(head(sums), digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
```

Due to limited sample sizes, it is likely that GAMs will not fit for every species. For example, in the table below of summary data for species whose models did not successfully fit or converge, there are several instances where all individuals of a species survived the census interval, leaving no mortality events for the model.

We need to exclude species without complete model runs from our overall calculations when looking for optimal decay parameters across the entire data set.

```{r}
# Tally up number of model runs by species and total decay values
table(sums$sp, sums$decay_total)

# Get incomplete run-site-species combinations
run_counts_by_sp <- sums %>%
  group_by(sp) %>%
  tally() %>%
  # Join with overall species list
  left_join(sp_summary %>% select(sp), ., by = "sp")

# Get expected number of runs if all models work
expected_total_runs <- run_settings %>%
  group_by(species) %>%
  tally() %>%
  pull(n) %>%
  max()

# Save species names where they didn't have all expected combinations
# of model runs
incomplete <- run_counts_by_sp$sp[run_counts_by_sp$n < expected_total_runs |
                                    is.na(run_counts_by_sp$n)]

# Species with successful runs
knitr::kable(sp_summary[!sp_summary$sp %in% incomplete, ], digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))

# Species without successful runs
knitr::kable(sp_summary[sp_summary$sp %in% incomplete, ], digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
```

## Selecting optimum decay parameter values across all species

We then summarize different model criteria across all species runs.
To look for the optimal values of the decay parameters, we sum log likelihoods across all species for a given decay parameter combination and choose the parameter combination with the highest summed log likelihood.

```{r, message=FALSE}
sums_total <- sums %>%
  filter(!sp %in% incomplete) %>%
  group_by(decay_total, decay_con, density_type) %>%
  summarise(nvalues = n(),
            sumlogLik = sum(logLik),
            meanlogLik = mean(logLik)) %>%
  arrange(decay_total, decay_con, density_type)

sums_total
```

We create a heatmap (@fig-optim) of summed log likelihoods for all parameter combinations across all species, with the optimal parameter combination marked with an X.

```{r}
#| label: fig-optim
#| fig-cap: "Heatmap of optimal values for decay constants"

# Find optimum value separately for N and BA
optimum <- sums_total %>%
  group_by(density_type) %>%
  slice_max(sumlogLik)

# Plot heatmap of log likelihood values
ggplot(sums_total, aes(x = decay_total, y = decay_con, fill = sumlogLik)) +
  geom_tile(width = 0.9, height = 0.9, col = "black") +
  scale_fill_viridis_c() + # viridis color palette
  geom_label(data = optimum, label = "X") +
  labs(x = "Decay total density", y = "Decay conspecific density",
       fill = "sumlogLik") +
  facet_wrap(~density_type, ncol = 1) +
  theme_bw(12) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
```

For this data set, the following are the optimal decay parameter values across all species, separately for neighborhood density calculated by abundance (N) and by basal area (BA):

```{r}
knitr::kable(optimum, digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
```

## Selecting optimum decay parameter values separately for each species

Alternatively, it may be useful to determine optimum decay parameters on a species-by-species basis.
To do this, we find the maximum log likelihood for each species separately across decay parameter combinations and choose the parameter combination with the highest log likelihood.

```{r, message=FALSE}
sums_total_by_spp <- sums %>%
  filter(!sp %in% incomplete) %>%
  group_by(decay_total, decay_con, density_type, sp) %>%
  summarise(nvalues = n(),
            sumlogLik = sum(logLik),
            meanlogLik = mean(logLik)) %>%
  arrange(decay_total, decay_con, density_type)

sums_total_by_spp
```

We create a heatmap (@fig-optim-spp) of log likelihoods for all parameter combinations for each species separately, with the optimal parameter combination marked with an X. We only display the first three species here: with data sets containing many species, it will be hard to visualize all the species on one graph, and you may need to subdivide the plot further for visualization.

```{r, fig.height = 10, fig.width = 8}
#| label: fig-optim-spp
#| fig-cap: "Heatmap of optimal values for decay constants separately for each species"

# Subset to three example species for display
sums <- sums %>%
  filter(sp %in% c("cappfr", "cordbi", "des2pa")) %>%
  group_by(sp) %>%
  # Scale log likelihood by species to help with visualization
  mutate(logLik_scaled = scale(logLik)) %>%
  ungroup()

# Find optimum value separately for N and BA
optimum_by_sp <- sums %>%
  group_by(sp, density_type) %>%
  slice_max(logLik_scaled, with_ties = FALSE)

# Plot heatmap of log likelihood values
ggplot(sums, aes(x = decay_total, y = decay_con, fill = logLik_scaled)) +
  geom_tile(width = 0.9, height = 0.9, col = "black") +
  geom_label(data = optimum_by_sp, label = "X") +
  labs(x = "Decay total density", y = "Decay conspecific density",
       fill = "logLik") +
  facet_wrap(~sp + density_type, ncol = 2, scales = "free") +
  scale_fill_viridis_c() + # viridis color palette
  theme_bw(12) +
  theme(legend.position = "right") +
  labs(fill = "Log likelihood\n(scaled)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
```

For this data set, the following are the optimal decay parameter values for each species, separately for neighborhood
density calculated by abundance (N) and by basal area (BA):

```{r}
knitr::kable(optimum_by_sp, digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
```
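As a compact recap of the weighting scheme used throughout this section, the distance- and size-weighted conspecific density for a focal tree is the sum over neighbors j of exp(-d_ij / mu) * ba_j. A toy worked example with three hypothetical neighbors and an illustrative decay constant (these numbers are not from the BCI data; `exponential_decay` is restated here so the sketch is self-contained):

```{r}
# Same decay function as defined earlier in this tutorial
exponential_decay <- function(mu, distance) exp(-(1 / mu * distance))

d_ij <- c(2, 10, 25)         # hypothetical distances to neighbors (m)
ba_j <- c(0.05, 0.20, 0.10)  # hypothetical neighbor basal areas (m^2)
mu   <- 15                   # illustrative decay constant

# Weight each neighbor by distance, then sum the weighted basal areas
weights <- exponential_decay(mu, d_ij)
con_BA  <- sum(weights * ba_j)
con_BA # distance-weighted conspecific basal-area density
```

Nearby, large neighbors dominate the sum, while distant or small neighbors contribute little, which is exactly the behavior the decay-parameter search above is tuning.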